American Journal of Epidemiology
Oxford University Press (OUP)
Preprints posted in the last 7 days, ranked by how well they match American Journal of Epidemiology's content profile, based on 57 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Kamulegeya, R.; Nabatanzi, R.; Semugenze, D.; Mugala, F.; Takuwa, M.; Nasinghe, E.; Musinguzi, D.; Namiiro, S.; Katumba, A.; Ssengooba, W.; Nakatumba-Nabende, J.; Kivunike, F. N.; Kateete, D. P.
Background: Tuberculosis (TB) remains a leading cause of infectious disease mortality worldwide, and treatment failure contributes to ongoing transmission, drug resistance, and poor clinical outcomes. Artificial intelligence and machine learning approaches have attracted growing interest for predicting tuberculosis treatment outcomes, but the literature is heterogeneous and lacks a comprehensive synthesis. Methods: We conducted a systematic review and meta-analysis of studies that developed or validated machine learning models to predict TB treatment failure. We searched PubMed/MEDLINE and Embase from January 2000 to October 2025. Studies were eligible if they developed, validated, or implemented an artificial intelligence or machine learning model for the prediction of TB treatment failure or a closely related poor outcome in patients receiving anti-TB treatment. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool. Random-effects meta-analysis was performed to pool area under the curve values, with subgroup analyses and meta-regression to explore heterogeneity. Results: Thirty-four studies were included in the systematic review, of which 19 reported area under the curve values suitable for meta-analysis (total participants, 100,790). Studies were published between 2014 and 2025, with 91% published from 2019 onward. Tree-based methods were the most common algorithm family (52.9%), and multimodal models integrating three or more data types were used in 41.2% of studies. The pooled area under the curve was 0.836 (95% confidence interval 0.799-0.868), with substantial heterogeneity (I² = 97.9%). In subgroup analyses, studies including HIV-positive participants showed lower discrimination (pooled area under the curve 0.748) than those excluding them (0.924). 
Only eight studies (23.5%) performed external validation, and only one study (2.9%) was rated as low risk of bias overall, primarily due to methodological concerns in the analysis domain. Egger's test suggested publication bias (p = 0.024). Major evidence gaps included underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease. Conclusions: Machine learning models for predicting TB treatment failure show promising discrimination but are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and methodological limitations, including inadequate validation, poor calibration assessment, and high risk of bias, limit confidence in current estimates. Future research should prioritize rigorous external validation, calibration assessment, and development in underrepresented populations, particularly HIV-affected and high-burden settings. Author Summary: TB kills over a million people annually. While curable, treatment failure remains common and drives ongoing transmission and drug resistance. Researchers increasingly use artificial intelligence and machine learning to predict which patients will fail treatment, but it is unclear if these models are ready for clinical use. We reviewed 34 studies including nearly 1.1 million participants from 22 countries. On average, models correctly distinguished patients who would fail treatment from those who would not 84% of the time, a performance generally considered good. However, this average hid enormous variation. Models developed in populations including HIV-positive people performed substantially worse, suggesting prediction is harder with HIV co-infection. Worryingly, only one study used high-quality methods; 97% had serious flaws in handling missing data, checking calibration, or testing in new populations. Only eight studies validated their models in different settings. 
To conclude, we found that machine learning is promising in predicting TB treatment failure, but it is not ready for clinical use. Researchers should prioritize validation in high-burden settings, include social determinants, and improve methodological rigor before these tools can help patients.
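The random-effects pooling and I² statistic mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator, which the abstract does not name, and it pools on whatever scale the inputs are supplied in (reviews of this kind often pool AUCs on a transformed scale). The function name and the inputs are hypothetical, not taken from the review.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled_effect, standard_error, tau_squared, i_squared_percent).
    Assumption: this estimator is chosen for illustration only.
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical AUC-like estimates and their variances (not from the review).
pooled, se, tau2, i2 = dersimonian_laird([0.75, 0.84, 0.92], [0.0004, 0.0009, 0.0006])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
```

An I² near 98%, as reported above, would mean that almost all of the observed spread between studies reflects real differences rather than sampling error, which is why the pooled value alone says little about performance in any one setting.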
RAZAFIMAHATRATRA, S. L.; RASOLOHARIMANANA, L. T.; ANDRIAMARO, T. M.; RANAIVOMANANA, P.; SCHOENHALS, M.
Interpreting serological data remains challenging, particularly in low-prevalence or cross-reactive contexts, where antibody responses often show substantial overlap between exposed and unexposed individuals and may depart from normal distributional assumptions. Conventional cutoff-based approaches often yield inconsistent or biased estimates of seroprevalence. Here, we present a decisional framework based on finite mixture models (FMMs) that enhances the robustness and interpretability of serological analyses. Beyond simply applying mixture models, our framework integrates multiple methodological innovations: (i) systematic comparison of Gaussian and skew-normal mixture models to accommodate asymmetric antibody distributions; (ii) rigorous model selection using the Cramér-von Mises test (p > 0.01) combined with a parsimonious score (APS) to prioritize models with well-separated clusters; and (iii) hierarchical clustering of posterior probabilities to collapse latent components into biologically meaningful seronegative and seropositive groups. Applied to chikungunya virus (CHIKV) data from Bangladesh, the framework produced prevalence estimates consistent with ROC-based methods while probabilistically identifying borderline cases. Validation on SARS-CoV-2 and dengue datasets further demonstrated its generalizability: for SARS-CoV-2, the approach identified up to five latent clusters with high sensitivity (up to 100%) and specificity (up to 100%), enabling discrimination by disease severity. For dengue, it revealed interpretable subgrouping consistent with background exposure and subclinical infection, despite limited confirmed cases. By integrating distributional flexibility, robust goodness-of-fit testing, and biologically guided cluster consolidation, this decisional FMM framework provides a reproducible and scalable method for serological interpretation across pathogens and epidemiological settings, addressing key limitations of threshold-based classification.
Wang, J.; Morrison, J.
Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal relationships between complex traits. Standard MR can be used to estimate an average causal effect at the population level, and typically assumes a linear exposure-outcome relationship. Recently, several methods for estimating nonlinear effects have been developed. However, many have been found to produce spurious empirical findings when subjected to negative control analyses. We propose that this poor performance may be attributable to heterogeneity in variant-exposure associations. We demonstrate that heterogeneous genetic effects on exposure lead to biased estimates, poor coverage, and inflated type I error in control function and stratification-based methods. In contrast, two-stage least squares (TSLS) methods are robust to such heterogeneity, but suffer from low precision and low power in some circumstances. We show that a statistical test for heterogeneity can be used to guide the choice of nonlinear MR methods. Using UK Biobank data, we reassess the causal effects of BMI, vitamin D, and alcohol consumption on blood pressure, lipid, C-reactive protein, and age (negative control). We find strong evidence of heterogeneity for all three exposures, and also recapitulate previous results that control function and stratification-based methods are prone to false positives. Finally, using nonparametric TSLS, we identify evidence of nonlinear causal effects of BMI on HDL cholesterol, triglycerides, and C-reactive protein; however, specific estimates of the shape of these relationships are imprecise. Altogether, our results suggest that common nonlinear MR methods are unreliable in the presence of realistic levels of heterogeneity, and that more methodological development is required before practically useful nonlinear MR is feasible.
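As a rough illustration of the instrumental-variable logic behind standard (linear) MR described above, the sketch below compares an ordinary regression slope with the single-instrument ratio estimator, which coincides with TSLS when there is one instrument. The data are invented toy values with a known causal effect of 2, and the function names are hypothetical. It does not implement any of the nonlinear methods the abstract evaluates.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Naive regression of outcome on exposure (ignores confounding)."""
    return cov(x, y) / cov(x, x)

def iv_ratio(z, x, y):
    """Ratio (Wald) estimator: instrument-outcome over instrument-exposure."""
    return cov(z, y) / cov(z, x)

# Toy data (assumed for illustration): z is a genetic instrument,
# u is an unmeasured confounder that is uncorrelated with z.
z = [0, 0, 1, 1]
u = [1, -1, 1, -1]
x = [zi + ui for zi, ui in zip(z, u)]          # exposure depends on z and u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true causal effect of x is 2

naive = ols_slope(x, y)     # biased upward by the confounder
causal = iv_ratio(z, x, y)  # recovers the true effect in this toy setup
```

The instrument works here only because it is unrelated to the confounder and affects the outcome solely through the exposure; the abstract's concern is what happens when the instrument-exposure effect itself varies across individuals.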
Gada, L.; Afuleni, M. K.; Noble, M.; House, T.; Finnie, T.
Knowing the mortality rates associated with infection by a pathogen is essential for effective preparedness and response. Here, harnessing the flexibility of a Bayesian approach, we produce an estimate of the Infection Fatality Ratio (IFR) for A(H5N1) conditional on explicit assumptions, and quantify the uncertainty thereof. We also apply the method to first-wave COVID-19 data up to March 2020, demonstrating the estimates that could be obtained were the model available then. Our analysis uses World Development Indicators (WDI) from the World Bank, the A(H5N1) WHO confirmed cases and deaths tracker by country (2003-2024), and COVID-19 cases and deaths data from Johns Hopkins University (January and February 2020). Since infectious disease dynamics are typically influenced by local socio-economic factors rather than political borders, individual countries are placed within clusters of countries sharing similar WDIs relevant to respiratory viral diseases, with clusters derived by performing Hierarchical Clustering. To estimate the IFR, we fit a Negative Binomial Bayesian Hierarchical Model for A(H5N1) and COVID-19 separately. We explicitly modelled key unobserved parameters with informative priors from expert opinion and literature. By modelling underreporting, our analysis suggests lower fatality (15.3%) compared to WHO's Case Fatality Ratio estimate (54%) on lab-confirmed cases. However, credible intervals are wide ([0.5%, 64.2%] 95% CrI). Therefore, good preparedness for a potential A(H5N1) pandemic implies adopting scenario planning under our central estimate, as well as for IFRs as high as 70%. Our approach also returns a COVID-19 IFR estimate of 2.8% with [2.5%, 3.1%] 95% CrI which is consistent with literature.
Franzese, F.; Bergmann, M.; Burzynska, A.
Socioeconomic inequalities in health and well-being are a major public health concern, particularly in ageing populations. Education is a key determinant shaping multiple aspects of health outcomes. We used cross-sectional data from wave 9 of the German sample (n=4,148) of the Survey of Health, Ageing and Retirement in Europe (SHARE) to test whether formal education is associated with well-being in later adulthood, with health literacy, self-rated health, and preventive health behaviours as possible mediators. Our results showed that education was positively associated with greater well-being, but only via indirect pathways. Specifically, self-rated health, health literacy, and fruit and vegetable consumption mediated the relationship between education and well-being accounting for 54.7, 24.7, and 12.6 percent of the total effect, respectively. In addition, there were significant positive correlations between education and health literacy, as well as high-intensity physical activity, daily fruit and vegetable consumption, more preventive health check-ups, and less smoking. In contrast, alcohol consumption was more common among those with higher levels of education. All health behaviours and health literacy were correlated directly or indirectly (i.e., mediated by health) with well-being. These findings highlight the importance of examining indirect pathways linking education to well-being in later life. Interventions aimed at improving health literacy and promoting healthy behaviours may help reduce educational inequalities in quality of life among older adults.
Li, Y.; Cabral, H.; Tripodis, Y.; Ma, J.; Levy, D.; Joehanes, R.; Liu, C.; Lee, J.
Mediation analysis quantifies how an exposure affects an outcome through an intermediate variable. We extend mediation analysis to capture the cumulative effects of longitudinal predictors on longitudinal outcomes. Our proposed model examines how mediators transmit the effects of the current and previous exposure on the current outcome. We construct a least-squares estimator for the cumulative indirect effect (CIE) and use three approaches (exact form, delta method, and bootstrap procedure) to estimate its standard error (SE). The estimator of CIE is unbiased when there is no unmeasured confounding and the model errors of the mediator and outcome models are independent at all time points, as shown analytically and in simulations. While the three SE estimates are numerically similar, the bootstrap procedure is recommended because it is simple to implement. We apply this method to the Framingham Heart Study Offspring cohort to assess whether DNA methylation mediates the association of alcohol consumption with systolic blood pressure over two time points. We identify two CpGs (cg05130679 and cg05465916) as mediators and construct a composite DNA methylation score from 11 CpGs, which mediates 39% of the cumulative effect. In conclusion, we propose an unbiased estimator for CIE. Future studies will investigate missingness in mediators and outcomes.
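The basic quantity that the abstract above generalizes can be sketched for a single time point: the indirect effect is the product of the exposure-to-mediator coefficient and the mediator-to-outcome coefficient (adjusted for exposure), both estimated by least squares. This is a simplified illustration on invented toy data and is not the authors' cumulative longitudinal estimator. All names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def centered(values):
    m = mean(values)
    return [v - m for v in values]

def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc, yc = centered(x), centered(y)
    return sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)

def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect for one time point.

    a: effect of exposure x on mediator m.
    b: effect of m on outcome y adjusting for x, obtained by regressing y
       on the part of m that x does not explain (Frisch-Waugh step).
    """
    a = slope(x, m)
    xc, mc = centered(x), centered(m)
    m_resid = [mi - a * xi for mi, xi in zip(mc, xc)]
    b = slope(m_resid, y)
    return a * b

# Toy data (assumed): m = 0.5*x + e, y = 1.0*x + 2.0*m, with e unrelated to x.
x = [0, 0, 1, 1]
e = [1, -1, 1, -1]
m = [0.5 * xi + ei for xi, ei in zip(x, e)]
y = [1.0 * xi + 2.0 * mi for xi, mi in zip(x, m)]

ie = indirect_effect(x, m, y)        # 0.5 * 2.0
total = slope(x, y)                  # direct (1.0) plus indirect (1.0)
proportion_mediated = ie / total
```

A bootstrap standard error, the approach the abstract recommends, would resample individuals with replacement and recompute `indirect_effect` on each resample.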
Nilsson, A.; da Silva, M.; Le, H. T.; Haggstrom, C.; Wahlstrom, J.; Michaelsson, K.; Trolle Lagerros, Y.; Sandin, S.; Magnusson, P. K.; Fritz, J.; Stocks, T.
Excess body weight has been associated with increased cancer risk, but the role of weight change across adulthood remains unclear. We examined body weight trajectories from ages 17 to 60 and their associations with site-specific cancer incidence. Data were based on the ODDS study, a pooled, nationwide cohort study in Sweden, with data on weight spanning 1911 to 2020, and cancer follow-up through 2023. Weight trajectories were estimated with linear mixed effects models in individuals with at least three weight measurements. Cox regressions estimated hazard ratios for associations between weight trajectories and established and potentially obesity-related cancers. Fifth versus first quintile of weight change was associated with many cancers, most strongly with esophageal adenocarcinoma in men (HR 2.25; 95% CI 1.66-3.04), liver cancer in men (HR 2.67; 95% CI 2.15-3.33), endometrial cancer in women (HR 3.78; 95% CI 3.09-4.61), and pituitary tumors in both sexes (men: HR 3.13 [95% CI 2.13-4.61]; women: HR 2.13 [95% CI 1.41-3.22]). Associations varied by sex and age. Heavier weight at age 17 years and earlier obesity onset were also associated with higher cancer incidence. These findings highlight the importance of a life-course approach to weight management and support sex- and age-targeted cancer prevention strategies.
Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.
Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. 
Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.
Rehman, N.; Guyatt, G.; JinJin, M.; Silva, L. K.; Gu, J.; Munir, M.; Sadagari, R.; Li, M.; Xie, D.; Rajkumar, S.; Lijiao, Y.; Najmabadi, E.; Dhanam, V.; Mertz, D.; Jones, A.
Background: Sustained retention in care supports continuous access to antiretroviral therapy, routine clinical monitoring, and long-term viral suppression. Objective: To compare the effectiveness of interventions for improving retention in care among people living with HIV (PLHIV). Design: Systematic review and network meta-analysis. Data sources: PubMed, Embase, CINAHL, PsycINFO, Web of Science, and the Cochrane Library from 1995 to December 2024. Eligibility criteria: Randomised controlled trials (RCTs) evaluating interventions to improve retention in care, viral load suppression, or quality of life (QoL) among PLHIV, compared with standard of care (SoC) or other interventions. Data extraction and synthesis: Pairs of reviewers independently screened studies, extracted data, and assessed risk of bias using ROBUST-RCT. We conducted a fixed-effect frequentist network meta-analysis and rated intervention categories relative to SoC based on effect estimates and the certainty of evidence. Dichotomous outcomes were summarized as odds ratios (ORs) with 95% confidence intervals (CIs), and continuous outcomes as mean differences (MDs) with 95% CI. Results: Eighty-four trials enrolling 107 137 PLHIV evaluated 13 intervention categories. For retention in care, five interventions supported by moderate or high certainty evidence proved superior to SoC: multi-month dispensing (OR 2.02, 95% CI 1.32 to 3.09), task shifting (OR 1.94, 95% CI 1.42 to 2.66), differentiated service delivery (OR 1.47, 95% CI 1.22 to 1.76), behavioural counselling (OR 1.36, 95% CI 1.21 to 1.54), and supportive interventions (OR 1.31, 95% CI 1.11 to 1.55). For viral load suppression, two interventions supported by moderate or high certainty evidence proved superior to SoC: task shifting (OR 2.07, 95% CI 1.25 to 3.43) and behavioural counselling (OR 1.34, 95% CI 1.11 to 1.67). Across outcomes, no intervention demonstrated convincing superiority over other active interventions. 
Conclusions: Among 13 intervention categories, only a subset provided moderate or high-certainty evidence of superiority to the standard of care, and none showed superiority over other interventions. Persistent evidence gaps for key populations, diverse settings, and long-term outcomes support the need for context-sensitive and patient-centred interventions. Registration: PROSPERO CRD42024589177. Strengths and limitations of this study: This systematic review followed Cochrane methods and was reported in accordance with PRISMA-NMA guidelines. The network meta-analysis integrated direct and indirect evidence to compare multiple intervention categories within a single framework. Risk of bias and certainty of evidence were assessed using ROBUST-RCT and the GRADE approach for network meta-analysis, respectively. Some networks were sparse, and limited representation of key populations and long-term follow-up constrained the strength and generalisability of inferences.
Yao, S.; Zimbalist, A.; Sheng, H.; Fiorica, P.; Cheng, R.; Medicino, L.; Omilian, A.; Zhu, Q.; Roh, J.; Laurent, C.; Lee, V.; Ergas, I.; Iribarren, C.; Rana, J.; Nguyen-Huynh, M.; Rillamas-Sun, E.; Hershman, D.; Ambrosone, C.; Kushi, L.; Greenlee, H.; Kwan, M.
Background: Few studies have examined racioethnic disparities in cardiovascular disease (CVD) in women after breast cancer treatment, who are at higher risk due to cardiotoxic cancer treatment. Methods: Based on the Pathways Heart Study of women with a history of breast cancer, this analysis examines the association between cardiometabolic risk factors (hypertension, diabetes, and dyslipidemia) and CVD events with self-reported race and ethnicity, as well as genetic similarity. Multivariable logistic and Cox proportional hazards regression models were used to test race and ethnicity and genetic similarity with prevalent and incident cardiometabolic risk factors and CVD events. Results: Of the 4,071 patients in this analysis, non-Hispanic Black (NHB), Asian, and Hispanic women were more likely to have prevalent and incident diabetes than non-Hispanic White (NHW) women. Analysis of genetic similarity revealed results consistent with self-reported race and ethnicity. For CVD risk, NHB women were more likely to develop heart failure and cardiomyopathy than NHW women. In contrast, Hispanic women were at lower risk of any incident CVD, serious CVD, arrhythmia, heart failure or cardiomyopathy, and ischemic heart disease, which was consistent with the associations found with Native American ancestry. Conclusions: This is the largest multi-ethnic study of disparities in CVD health in breast cancer survivors, demonstrating corroborating findings between self-reported race and ethnicity and genetic similarity. The results highlight disparities in cardiometabolic risk factors and CVD among breast cancer survivors that warrant more research and clinical attention in these distinct, high-risk populations.
Gantenberg, J. R.; La Joie, R.; Heston, M. B.; Ackley, S. F.
Qualitative models of Alzheimer's pathology often posit that amyloid accumulation follows a sigmoid curve, indicating that the rate of deposition wanes over time. Longitudinal PET data now allow us to investigate amyloid accumulation trajectories with greater detail and over longer follow-up periods. We combine inferences from simulated amyloid trajectories, empirical PET data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), and the sampled iterative local approximation algorithm (SILA) to assess whether amyloid accumulation reaches a physiologic ceiling. We find that SILA reliably detects a ceiling, when present, across a range of simulated scenarios that impose a sigmoid shape. When fit to empirical data from ADNI, however, SILA does not appear to indicate the presence of a ceiling. Thus, we conclude that amyloid trajectories may not reach a physiologic ceiling during the stages of Alzheimer's disease typically observed while patients remain under follow-up in cohort studies. Fits using SILA indicate that illustrative models of biomarker cascades, while useful tools for conceptualizing and interrogating pathologic processes, may not represent the shapes of amyloid trajectories accurately. Summary for General Public: Amyloid, a protein implicated in Alzheimer's disease, is thought to reach a plateau in the brain, but methods that estimate how amyloid changes over time suggest it grows unabated. Gantenberg et al. use one such method and simulations to argue that amyloid does not reach a plateau during the typical course of Alzheimer's.
Zimba, R.; Kelvin, E. A.; Kulkarni, S.; Carmona, J.; Avoundjian, T.; Emmert, C.; Peterson, M.; Irvine, M.; Nash, D.
Introduction Understanding provider preferences for the design of HIV treatment packages could enhance the implementation of programs to support the adoption of long-acting injectable antiretroviral therapy (LAI ART) by people living with HIV who are interested in initiating this treatment modality. Methods We recruited providers from New York City (NYC), Rockland, Putnam, and Westchester County Ryan White Part A Medical Case Management (MCM) programs to complete a discrete choice experiment (DCE) containing twelve tasks with two alternatives and an opt-out option, with additional survey questions about implementation readiness and choice motivations. The alternatives included four attributes--Type of ART Medication (monthly or bimonthly LAI ART), Service Location and Mode, Support for Clients, and Rewards for Clients--with 2-4 levels each. We ran latent class multinomial logit analyses (LCA) with 1-5 classes to estimate preferences and explore hypothesis-free preference heterogeneity. We estimated attribute influence using relative importances and preferences using zero-centered part-worth utilities for each level. Results One hundred seventy-seven providers completed the survey (July 2022-January 2023). About half (52%) were 40-59 years old, 72% identified as women, and the plurality (41%) identified as Latino/a. We chose the two-group LCA solution. Bimonthly LAI ART was preferred over monthly LAI ART overall and in both groups. Group 1 (n=45) preferred more traditional adherence supports (e.g., injections at the clinic by appointment, injection appointment reminders) whereas Group 2 (n=132) preferred more client-centered supports (e.g., injections at home by appointment, free transportation to injection appointments if at a clinic). Both groups preferred higher monetary value gift cards for clients for every on-time injection. 
The top-ranking motivations indicated that participants prioritized patient convenience over job satisfaction and administrative or financial feasibility for the agency. The scores for all implementation measures indicate readiness to implement LAI ART in both groups. Conclusions Our implementation science-focused study suggests that providers of MCM services in NYC and surrounding counties are motivated to offer services to support clients' access and adherence to LAI ART. More work is needed to understand how programs have, in fact, integrated supports for LAI ART into their services.
O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.
Summary statistics fine-mapping methods offer advantages over classical methods, including avoiding data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP to published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample LD. Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L), with higher values reducing recall and no single L maximising performance. At optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer. Thirty MNR sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B, and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.
Garcia Quesada, M.; Wallrafen-Sam, K.; Kiti, M. C.; Ahmed, F.; Aguolu, O. G.; Ahmed, N.; Omer, S. B.; Lopman, B. A.; Jenness, S. M.
Non-pharmaceutical interventions (NPIs) have been important for controlling SARS-CoV-2 transmission, particularly before and during initial vaccine rollout. During the pandemic, the US Centers for Disease Control and Prevention issued isolation and masking guidance in case of COVID-19-like illness, a positive SARS-CoV-2 test, or known exposure to SARS-CoV-2. However, the impact of this guidance on mitigating transmission in office workplaces is unclear. We used a network-based mathematical model to estimate the impact of this guidance on SARS-CoV-2 transmission among office workers and their communities. The model represented social contacts in the home, office, and community. We used data from the CorporateMix study to parametrize social contacts among office workers and calibrated the model to represent the COVID-19 epidemic in Georgia, USA from January 2021 through August 2022. In the reference scenario (58% adherence to guidance among office workers and the broader population), workplace transmission accounted for a small fraction of total infections. Reducing adherence among office workers to 0% increased workplace transmissions by 27.1% and increasing adherence to 75% reduced workplace transmission by 7.0%. Increasing adherence to 75% among office workers had minimal impact on symptomatic cases and deaths; increasing it among the broader population was more effective in reducing office worker cases and deaths. In our model, moderate adherence to recommended NPIs in workplaces was effective in reducing transmission, but increasing adherence had limited benefit given workplaces that have low contact intensity and hybrid work arrangements. These results underscore the public health benefits of community-wide adoption of recommended NPIs.
Mahmud, S.; Akter, M. S.; Ahamed, B.; Rahman, A. E.; El Arifeen, S.; Hossain, A. T.
Background Depressive symptoms among reproductive-aged women represent a major public health concern in low- and middle-income countries, yet systematic screening remains limited. In most population survey datasets, the low prevalence of depression results in severe class imbalance, which challenges conventional machine learning models. Therefore, we develop and evaluate a bagging-based ensemble machine learning framework to predict depressive symptoms among reproductive-aged women using highly imbalanced Bangladesh Demographic and Health Survey (BDHS) 2022 data. Methods The sample comprised women aged 15-49 years drawn from BDHS 2022 data. Depressive symptoms were defined using the Patient Health Questionnaire (PHQ-9 ≥10). Candidate predictors were drawn from sociodemographic, reproductive, nutritional, psychosocial, healthcare access, and environmental domains. Feature selection was performed using Elastic Net (EN), Random Forest (RF), and XGBoost models. Five classifiers (EN, RF, Support Vector Machine (SVM), K-nearest neighbors (KNN), and Gradient Boosting Machine (GBM)) were trained using both oversampling-based approaches and the proposed ensemble framework. Model performance was evaluated on an independent test set using accuracy, sensitivity, specificity, F1-score, and the normalized Matthews correlation coefficient (normMCC). Results Approximately 4.8% of women were identified with depressive symptoms. The proposed bagging ensemble framework consistently achieved more balanced predictive performance than oversampling-based models. Average normMCC improved from 0.540 (oversampling) to 0.557 (ensemble). RF and GBM ensembles demonstrated notable improvements in identifying depressive cases, while the EN ensemble achieved the highest overall performance and sensitivity. Threshold optimization yielded stable normMCC across models, indicating robust trade-offs between sensitivity and specificity. 
Conclusions: Bagging-based ensemble learning provides a more robust and balanced approach than synthetic oversampling for predicting depressive symptoms in highly imbalanced population survey data. This approach has important implications for improving early identification and population-level mental health surveillance in resource-constrained settings.
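To make the normMCC metric concrete, here is a minimal sketch. It assumes the common rescaling normMCC = (MCC + 1) / 2, which maps the coefficient from [-1, 1] to [0, 1]; the abstract does not state the exact formula the authors used. The function name `norm_mcc` and the confusion-matrix counts are hypothetical and are not BDHS results.

```python
import math


def norm_mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient rescaled from [-1, 1] to [0, 1].

    Assumes the (MCC + 1) / 2 convention; MCC is set to 0 when any
    marginal total is zero and the coefficient is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
    return (mcc + 1) / 2


# Hypothetical test set of 1,000 women with 48 cases (about 4.8% prevalence).
# A classifier that never predicts a case is 95.2% accurate, yet its
# normMCC is only 0.5, the value expected from an uninformative model.
accuracy = (0 + 952) / 1000
print(accuracy, norm_mcc(tp=0, fp=0, tn=952, fn=48))
```

This is why a chance-level normMCC of 0.5 is the relevant baseline for the reported values of 0.540 and 0.557, rather than accuracy.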
Robert, A.; Goodfellow, L.; Pellis, L.; van Leeuwen, E.; Edmunds, W. J.; Quilty, B. J.; van Zandvoort, K.; Eggo, R. M.
Background: In England, the burden of respiratory infections varies by ethnicity, contributing to health inequalities, but the role of additional demographic factors remains underexplored. We quantified how differences in social mixing and demographic characteristics between ethnic groups cause inequalities in transmission dynamics. Methods: We analysed the association between ethnicity and the number of contacts of 12,484 participants in the 2024-2025 Reconnect social contact survey, using a negative binomial regression model. We simulated respiratory pathogen epidemics using a compartmental model stratified by age, ethnicity, and contact levels, at a national level and in major cities in England. Findings: After adjusting for demographic variables, participants of Black and Mixed ethnicities had more contacts than those of White ethnicity (rate ratios (RR): 1.18 [95% Credible Interval (CI): 1.11-1.26], and 1.31 [95% CI: 1.14-1.52]). Participants of Asian ethnicity had fewer contacts (RR: 0.85 [95% CI: 0.79-0.91]). In national-level simulations, individuals of White ethnicity had the lowest attack rates due to demographic differences and mixing patterns. Local demographic structures changed simulated dynamics: attack rates in individuals of Black and Mixed ethnicities were approximately double those of White ethnicity in Birmingham, but less than 60% higher in Liverpool. Interpretation: Demographic characteristics and mixing patterns create inequalities in transmission dynamics between ethnicities, while local demographic characteristics and pathogen infectiousness change the expected relative burden. To ensure mitigation strategies are effective and equitable, their evaluation must explicitly account for inequalities arising from local context.
Funding: Medical Research Council, National Institute for Health and Care Research, Wellcome Trust. Research in context. Evidence before this study: We searched PubMed for population-based studies quantifying differences in respiratory infections between ethnic groups, up to 1 April 2026, with no language restrictions. Keywords included: (respiratory pathogens OR influenza OR COVID-19) AND (ethnic* OR race) AND (inequ*) AND (compartmental model OR incidence rate ratio OR hazard ratio). We excluded studies that focused on non-respiratory pathogens (e.g. looking at consequences of COVID-19 on incidence of other pathogens). A population-based cohort study showed that influenza infection risk was higher in South Asian, Black, and Mixed ethnic groups compared to White ethnicity in England. Another population-based cohort study highlighted that during the first wave of COVID-19 in England, the South Asian, Black, and Mixed ethnic groups were more likely to test positive and to be hospitalised than the White ethnic group. Census data in England showed that the distributions of age, household size, household income and employment status differed between ethnic groups, and the recent Reconnect social contact surveys highlighted the impact of each demographic factor on the participants' number of contacts. Added value of this study: Our study shows that social contact patterns, mixing, and demographic structure all lead to unequal infection risk between ethnic groups in respiratory pathogen epidemics. Using the largest available social contact survey in England, we show that both the average number of contacts and the proportion of high-contact individuals varied by ethnic group, even after adjusting for participants' demographics. These differences, together with mixing patterns and age structure, led to lower expected incidence among individuals of White ethnicity than in all other ethnic groups in simulated outbreaks.
The level of inequality between ethnic groups changed when we used different values of pathogen transmissibility. Finally, as ethnic composition and population structure differ between cities in England, our results show differences in expected inequalities at a local level. Implications of all the available evidence: Inequalities in infection risk between ethnic groups are context- and pathogen-dependent. They arise from both local population structure and contact patterns. Detailed information on mixing between groups and population structure is needed to accurately measure group-specific infection risk. These findings indicate that public health interventions based only on national-level estimates conceal regional variation in risk and may ultimately increase inequalities. Public health interventions need to be tailored to local contexts to be equitable and effective. Finally, our findings provide a foundation for understanding the progression from infection-risk inequalities to disparities in disease presentation and clinical outcomes.
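The link between contact rates and attack rates can be illustrated with a toy compartmental model. The sketch below is not the authors' model: it is a two-group SIR system with proportionate mixing, no age structure, and simple Euler time steps. The function name, the parameter values, and the groups "A" and "B" are all hypothetical.

```python
def simulate_attack_rates(contacts, shares, beta=0.3, gamma=0.2,
                          dt=0.1, steps=20000):
    """Final attack rate per group in an SIR model with proportionate mixing.

    contacts: relative contact rate of each group (hypothetical values)
    shares:   population share of each group, summing to 1
    """
    seed = 1e-4  # same initial infected fraction in every group
    s = [n * (1 - seed) for n in shares]
    i = [n * seed for n in shares]
    total_contacts = sum(c * n for c, n in zip(contacts, shares))
    for _ in range(steps):
        # Contacts made by infectious people, computed before any update.
        infectious_contacts = sum(c * x for c, x in zip(contacts, i))
        for g, c in enumerate(contacts):
            force = beta * c * infectious_contacts / total_contacts
            new_infections = force * s[g] * dt
            recoveries = gamma * i[g] * dt
            s[g] -= new_infections
            i[g] += new_infections - recoveries
    return [1 - s_g / n for s_g, n in zip(s, shares)]


# Group B has 20% more contacts than group A (illustrative numbers only).
rate_a, rate_b = simulate_attack_rates(contacts=[1.0, 1.2], shares=[0.8, 0.2])
print(f"attack rate A: {rate_a:.3f}, attack rate B: {rate_b:.3f}")
```

Because this sketch ignores age structure and assortative mixing, it cannot reproduce the study's finding that local demography changes the size of the gap; it only shows the direction of the contact-rate effect.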
Bui, L. V.; Nguyen, D. N.
Background. Vietnam's disease burden has shifted from communicable, maternal, neonatal, and nutritional (CMNN) causes to non-communicable diseases (NCDs), but the tempo, drivers, and regional positioning of this transition have not been jointly quantified. We characterised Vietnam's epidemiological transition 1990-2023 against ten Southeast-Asian (SEA) peers. Methods. Using Global Burden of Disease 2023 data, we computed joinpoint-regression AAPC with 95% CI (BIC-penalised, up to three break-points) for age-standardised DALY rates and cause-composition shares. We applied Das Gupta three-factor decomposition to 1990-2023 absolute DALY change (population-size, age-structure, age-specific-rate effects) and benchmarked Vietnam's NCD share against an SDI-conditional peer trajectory via leave-one-out quadratic regression. Premature mortality was quantified as WHO 30q70 under both broad NCD and strict SDG 3.4.1 definitions, using Chiang II life-table adjustment identically across all eleven countries. Findings. The CMNN age-standardised DALY rate fell from 13,295.9 to 4,022.1 per 100,000 (AAPC -4.63%/year; 95% CI -4.80 to -4.46); the NCD rate fell only from 21,688.2 to 19,282.8 (AAPC -0.37; -0.45 to -0.30). NCD share of total DALYs rose from 52.99% to 70.67% (+17.67 pp; AAPC +1.09). Vietnam ranked fourth of eleven SEA countries in 2023 (up from sixth in 1990) and sat 5.3% above the SDI-expected trajectory. Das Gupta decomposition attributed the +10.63 million NCD DALY increase to population growth (+6.26 M) and ageing (+6.08 M); rate change removed only 1.71 M. Premature NCD mortality fell from 25.02% to 21.80% (broad, 12.9% reduction) and from 22.17% to 19.50% (SDG 3.4.1, 12.0%; Vietnam sixth of eleven) - far short of the SDG 3.4 one-third-reduction target. Interpretation. Vietnam has entered a disability- and ageing-dominated NCD phase. Meeting SDG 3.4 by 2030 requires population-scale primary prevention sized to demographic momentum.
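The premature-mortality indicator can be sketched as a life-table calculation over the eight five-year age bands from 30-34 to 65-69. The version below assumes deaths fall mid-interval on average (a = 0.5), which may differ from the Chiang II adjustment the authors applied. The function names and the flat mortality rate are hypothetical; only the percentages in the final lines come from the abstract.

```python
def premature_mortality(rates, width=5, a=0.5):
    """Unconditional probability of dying between ages 30 and 70.

    rates: cause-specific death rates per person-year for the age bands
           30-34, 35-39, ..., 65-69 (eight values)
    a:     assumed average fraction of the interval lived by those who die
    """
    survival = 1.0
    for m in rates:
        # Convert the band's death rate into a probability of dying in it.
        q = width * m / (1 + width * (1 - a) * m)
        survival *= 1 - q
    return 1 - survival


def relative_reduction(start, end):
    """Proportional fall from start to end, e.g. 0.129 for a 12.9% drop."""
    return (start - end) / start


# Hypothetical flat rate of 5 deaths per 1,000 person-years in every band.
print(round(premature_mortality([0.005] * 8), 4))

# Reported broad-definition values: 25.02% falling to 21.80%.
print(round(100 * relative_reduction(25.02, 21.80), 1))  # 12.9
```

The second figure matches the 12.9% reduction in the abstract and sits well below the one-third (33.3%) reduction that SDG 3.4 calls for.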
Hassell, N.; Marcenac, P.; Bationo, C. S.; Hirve, S.; Tempia, S.; Rolfes, M. A.; Duca, L. M.; Hammond, A.; Wijesinghe, P. R.; Heraud, J.-M.; Pereyaslov, D.; Zhang, W.; Kondor, R. J.; Azziz-Baumgartner, E.
Introduction: Modeling when influenza epidemics typically occur can help countries optimize surveillance, time clinical and public health interventions, and reduce the burden of influenza. Methods: We used influenza virus detections reported during 2011-2024 by 180 countries to the Global Influenza Surveillance and Response System, excluding COVID-19 pandemic impacted years (2020-2023). We analyzed data by calendar year (week 1-52) or shifted year (week 30-29) time windows, based on when most influenza detections occurred in each country. For countries with sufficient data, we computed generalized additive models (GAMs) of each country's weekly influenza-positive tests to smooth and impute time series distributions. From these GAMs, we calculated each country's normalized weekly influenza burden. Country-specific normalized time series were grouped using hierarchical k-means clustering reducing the Euclidean distance between time series within clusters. We calculated cluster-specific GAMs to estimate average seasonal timing. Countries without sufficient data were assigned to a cluster based on population-weighted latitudinal distance to a cluster's mean latitude. Results: We identified five clusters, or epidemic zones, from 111 countries with sufficient data. The influenza burden in epidemic zones A and B was consistent with a northern hemisphere pattern, with most influenza detections occurring during October-April (A) and September-March (B), while epidemic zones D and E were characterized by southern hemisphere-like seasonal timing, with most influenza burden occurring during May-November. Epidemic zone C had most influenza burden occurring during September-March; most countries assigned to this cluster were in the tropics. 
Conclusion: Epidemic zones may serve as a useful tool to strengthen and optimize influenza surveillance for global health decision-making (e.g., during vaccine strain composition discussions) and to guide country preparedness efforts for seasonal influenza epidemics, including the timing of enhanced surveillance, as well as the procurement and delivery of vaccines and antivirals.
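The normalise-then-cluster step can be sketched in a few lines. This is a simplified illustration only: it uses plain k-means (Lloyd's algorithm) with a naive deterministic start rather than the hierarchical k-means described above, skips the GAM smoothing, and uses 12 monthly values instead of weekly data. The four series and the function names are invented.

```python
def normalize(counts):
    """Rescale a series of detections so that it sums to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def kmeans(series, k, iterations=20):
    """Plain k-means on equal-length series using squared Euclidean distance."""
    centroids = [list(s) for s in series[:k]]  # naive start: first k series
    labels = [0] * len(series)
    for _ in range(iterations):
        # Assignment step: attach each series to its nearest centroid.
        for idx, s in enumerate(series):
            dists = [sum((x - y) ** 2 for x, y in zip(s, c)) for c in centroids]
            labels[idx] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented monthly detections (Jan-Dec) for four hypothetical countries.
raw = [
    [30, 25, 10, 3, 1, 1, 1, 1, 2, 5, 12, 25],           # winter peak, small
    [1, 1, 2, 5, 12, 25, 30, 25, 10, 3, 1, 1],           # mid-year peak, small
    [300, 200, 120, 40, 10, 5, 5, 10, 20, 60, 150, 280],  # winter peak, large
    [2, 2, 6, 15, 40, 70, 80, 60, 25, 8, 3, 2],          # mid-year peak, large
]
labels = kmeans([normalize(r) for r in raw], k=2)
print(labels)
```

Normalising first means countries are grouped by the timing of their season rather than by how many specimens they test, which is why the small and large winter-peak series end up together.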
Vliegenthart-Jongbloed, K. J.; Bunea, O.-M.; Fijołek, F.; Razzolini, I. P.; Barber, T. J.; Bernardino, J. I.; Nozza, S.; Psomas, C. K.; De Scheerder, M.-A.; Vasylyev, M.; Voit, F. M.; Jordans, C. C. E.; Willemsen, R.; van Wingerden, M. D.; Bienkowski, C.; Miron, V. D.; Felder, A.-K.; Hanssen, B.; Hontelez, J.; Li, Y.; Stutterheim, S.; Skrzat, A.; Sandulescu, O.; Rokx, C.; #aware.hiv Europe,
Introduction: Across Europe, many people with HIV are diagnosed late despite repeated contact with hospital services for HIV indicator conditions. These conditions flag a possible underlying HIV infection for which HIV testing is recommended. They provide an opportunity to identify people with HIV, yet implementation of indicator-condition-based testing remains insufficient in hospital practice. The #aware.hiv Europe study was developed to address this gap by embedding HIV teams into routine care to normalise HIV testing. Methods and analysis: #aware.hiv Europe is a stepped-wedge cluster randomised trial in 30 hospitals across ten European countries. Five clusters of 6 hospitals each will sequentially transition from control to implementation periods, when local HIV teams led by an infectious diseases specialist will be installed. Intervention activities include hospital-wide peer audit and feedback on missed testing opportunities, targeted education, stigma reduction activities, and strengthening of linkage to HIV prevention and care. Patients with predefined HIV indicator conditions are identified using International Classification of Diseases, 10th Revision (ICD-10) diagnosis codes, confirmed through manual review. The primary outcome is the change in HIV testing rate among patients with confirmed HIV indicator conditions. Secondary outcomes include HIV case detection, cascades of diagnosis, care and prevention, variation in testing practices, healthcare professional knowledge and stigma, and implementation outcomes. Analyses will use mixed effects regression models accounting for clustering and time within the stepped-wedge design. Ethics and dissemination: The study has ethical approval in all hospitals to use routinely collected clinical data under exemption from informed consent for patient level data.
Results will be disseminated through peer reviewed publications, conferences, and collaboration with clinical and community partners with the goal to inform HIV testing policies. Trial registration: ClinicalTrials.gov NCT06900829. https://clinicaltrials.gov/study/NCT06900829
Strengths and limitations of this study:
+ Large, multinational, real-world, stepped-wedge, cluster randomized trial design.
+ Primary outcome derived from routinely collected clinical data, using a GDPR- and GCP-compliant approach with exemption from informed consent.
+ Hospital-wide intervention targeting care professionals, delivered through proactive expert HIV teams across departments powered to conclude on hard HIV care cascade clinical endpoints and stigma reducing interventions.
+ Implementation science design informed by established frameworks (CFIR and RE-AIM) to strengthen cross-continental generalisability.
- Variation in healthcare systems and baseline testing practices across countries may contribute to heterogeneity in implementation and outcomes.
- Despite standardised SOPs, local clinical judgement influences the assessment of HIV indicator conditions.
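The sequential roll-out of a stepped-wedge design can be written as a cluster-by-period exposure matrix. The sketch below assumes the textbook layout of one shared baseline period followed by one crossover per period, giving six periods for five clusters; the protocol summary above does not state the number or length of periods, so this is an illustration rather than the trial's actual schedule. The function name is hypothetical.

```python
def stepped_wedge_schedule(n_clusters, n_periods=None):
    """Exposure matrix: rows are clusters, columns are periods.

    0 marks a control period and 1 an implementation period. Cluster k
    (0-based) crosses over at period k + 1 and stays exposed afterwards.
    """
    if n_periods is None:
        n_periods = n_clusters + 1  # assumed: one baseline plus one step each
    return [
        [1 if period > cluster else 0 for period in range(n_periods)]
        for cluster in range(n_clusters)
    ]


schedule = stepped_wedge_schedule(n_clusters=5)
for cluster, row in enumerate(schedule, start=1):
    print(f"cluster {cluster} (6 hospitals): {row}")
```

Every cluster contributes both control and implementation periods, which is what lets a mixed effects model separate the intervention effect from calendar time.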
ENCISO DURAND, J. C.; Silva-Santisteban, A. A.; Reyes-Diaz, M.; Huicho, L.; Caceres, C. F.; LAMIS-2018,
Objectives: In Latin America, up-to-date information to monitor UNAIDS 95-95-95 HIV targets in key populations, such as men who have sex with men, is limited. Elsewhere, structural homophobia restricts access to ART. Conceptual frameworks suggest that intersecting forms of violence and discrimination may negatively influence HIV care outcomes through psychosocial and structural pathways, although empirical evidence remains limited. The study aimed to assess whether sexual orientation outness and recent homophobic violence are associated with not being on ART among Latin American MSM living with HIV. Methods: This cross-sectional study is a secondary analysis of data from LAMIS-2018, including 7,609 MSM aged 18+ with an HIV diagnosis ≥1 year prior from 18 Latin American countries. Participants self-reported ART status, sociodemographic characteristics, homophobic violence, and sexual orientation outness. Bivariate and multivariate logistic regressions identified those factors associated with not being on ART. Results: Nine percent of MSM with HIV were not on ART, 18% reported low sexual orientation outness, and 27% experienced homophobic violence, especially in Andean and Central American countries. Not being on ART was associated with recent homophobic violence (aPR=1.25), low outness (aPR=1.22), unemployment (aPR=1.27), and residence in the Andean subregion (aPR=1.87), Mexico (aPR=1.28), or the Southern Cone (aPR=1.45) versus Brazil. Protective factors included being older (25-39: aPR=0.72; >39: aPR=0.49), living in large cities (aPR=0.72), having a stable partner (aPR=0.78), and university education (aPR=0.74). Conclusions: Recent homophobic violence and low sexual orientation outness were associated with not being on ART among MSM in Latin America. While access varies across countries, structural factors such as stigma and violence may limit engagement in care.
Addressing these barriers alongside strengthening health systems may be key to improving ART uptake and advancing progress toward the 95-95-95 targets.